This report investigates patterns in shark catch counts by species, location, month, and length in data from the Queensland Shark Control Program.
The main findings are:
The data come from the Queensland Department of Agriculture and Fisheries. The dataset covers each day of 2016 and was collected as part of the Queensland Shark Control Program, a state government initiative to prevent shark bites.
The data can be considered reliable because they come from an official government source.
Possible issues include:
Potential stakeholders include marine biologists, government officials, tourists, etc.
Each row represents one shark caught, together with its associated data.
Each column is a field of data about the shark caught: Species Name, Date, Area, Location, Latitude, Longitude, Length (m), Water Temperature (C), Month, and Day of Week.
The key variables are:
## read in data
Sharks <- read.csv("sharks.csv")
Species <- Sharks$Species.Name
Length <- Sharks$Length..m.
Month <- Sharks$Month
Area <- Sharks$Area
Temperature <- Sharks$Water.Temp..C.
Classification of variables
## show classification of variables
str(Sharks)
## 'data.frame': 532 obs. of 10 variables:
## $ Species.Name : chr "AUSTRALIAN BLACKTIP" "BLACKTIP REEF WHALER" "BLACKTIP REEF WHALER" "BLACKTIP REEF WHALER" ...
## $ Date : chr "2016-11-16" "2016-01-02" "2016-01-02" "2016-01-05" ...
## $ Area : chr "Cairns" "Cairns" "Cairns" "Mackay" ...
## $ Location : chr "Holloways Beach" "Buchans Point Beach" "Ellis Beach" "Harbour Beach" ...
## $ Latitude : chr "-16°49.82" "-16°43.56" "-16°43.3" "-21°7.08" ...
## $ Longitude : chr "145°44.85" "145°39.78" "145°39.01" "149°13.62" ...
## $ Length..m. : num 1 0.7 1.5 2.2 1.7 1.2 0.75 1.2 0.8 1.3 ...
## $ Water.Temp..C.: int 27 27 27 26 26 29 30 31 29 29 ...
## $ Month : chr "November" "January" "January" "January" ...
## $ Day.of.Week : chr "Wednesday" "Saturday" "Saturday" "Tuesday" ...
Dimensions of data
## show the dimensions of data
dim(Sharks)
## [1] 532 10
We aim to determine which shark species was caught most often in the Shark Control Program in 2016. We do this by producing an interactive bar plot of the number of sharks caught for each species.
## write code here
library(tidyr)
library(dplyr)
##
## Attaching package: 'dplyr'
## The following objects are masked from 'package:stats':
##
## filter, lag
## The following objects are masked from 'package:base':
##
## intersect, setdiff, setequal, union
library(ggplot2)
library(plotly)
##
## Attaching package: 'plotly'
## The following object is masked from 'package:ggplot2':
##
## last_plot
## The following object is masked from 'package:stats':
##
## filter
## The following object is masked from 'package:graphics':
##
## layout
g <- ggplot(Sharks, aes(x = Species.Name)) +
  geom_bar(fill = "tomato3") +
  theme(axis.text.x = element_text(angle = 65, vjust = 0.5)) +
  xlab("Species") + ylab("Count") +
  ggtitle("Shark catch statistics by species")
ggplotly(g)
Summary: The bar plot shows that TIGER SHARK was the most caught species, with 207 catches, followed by BULL WHALER (91) and COMMON BLACKTIP WHALER (39).
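The ranking read off the bar plot can also be cross-checked with a sorted frequency table. A minimal sketch on a toy species vector (in the report itself this would be `sort(table(Sharks$Species.Name), decreasing = TRUE)`):

```r
## Toy stand-in for Sharks$Species.Name; the real data has 532 rows.
species <- c("TIGER SHARK", "TIGER SHARK", "TIGER SHARK",
             "BULL WHALER", "BULL WHALER", "COMMON BLACKTIP WHALER")

## A decreasing frequency table gives the same ranking as the bar plot:
## the first entry is the most caught species.
counts <- sort(table(species), decreasing = TRUE)
counts
```

This avoids reading counts off the plot by eye when exact numbers are needed.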
We now find the median length of each of the three most caught species and compare them.
## write code here
median(Sharks$Length..m.[Sharks$Species.Name == "TIGER SHARK"])
## [1] 2.37
median(Sharks$Length..m.[Sharks$Species.Name == "BULL WHALER"])
## [1] 1.43
median(Sharks$Length..m.[Sharks$Species.Name == "COMMON BLACKTIP WHALER"])
## [1] 1.06
We can now compare the median length of the species using box plots.
## write code here
par(mfrow = c(1, 3))
## For TIGER SHARK
boxplot(Sharks$Length..m.[Sharks$Species.Name == "TIGER SHARK"], col = "orange", las = 1, main = "TIGER SHARK", ylab = "Length (m)")
## For BULL WHALER
boxplot(Sharks$Length..m.[Sharks$Species.Name == "BULL WHALER"], col = "pink", las = 1, main = "BULL WHALER", ylab = "Length (m)")
## For COMMON BLACKTIP WHALER
boxplot(Sharks$Length..m.[Sharks$Species.Name == "COMMON BLACKTIP WHALER"], col = "yellow", las = 1, main = "COMMON BLACKTIP WHALER", ylab = "Length (m)")
Summary: Of the three most caught species, TIGER SHARK has the largest median length, at 2.37 m.
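The three separate `median()` calls above can be collapsed into one grouped computation with `tapply()`, which returns a named vector of medians per species. A minimal sketch on toy vectors (in the report this would be `tapply(Sharks$Length..m., Sharks$Species.Name, median)`):

```r
## Toy stand-ins for Sharks$Length..m. and Sharks$Species.Name.
length_m <- c(2.0, 2.5, 1.4, 1.5, 1.0, 1.1)
species  <- c("TIGER SHARK", "TIGER SHARK", "BULL WHALER",
              "BULL WHALER", "COMMON BLACKTIP WHALER", "COMMON BLACKTIP WHALER")

## One call computes the median length for every species at once.
medians <- tapply(length_m, species, median)
medians
```

This scales to all species in the dataset without repeating the filtering expression.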
We want to figure out which month had the most sharks caught. For this, we create an interactive bar plot and sort the months based on the number of sharks caught in descending order.
## write code here
y<- ggplot(Sharks, aes(x=reorder(Month, Month, function(x) - length(x)))) + geom_bar(fill="dark blue") + xlab("Months") + ylab("Count") + ggtitle("Shark catch statistics by month")
ggplotly(y)
Summary: Based on this bar plot, February had the highest number of shark catches (70), followed by May and January. This is broadly consistent with the ABC Science article “Are we seeing more sharks than usual at this time of year?”, which discusses seasonal patterns in shark activity.
We now want to find out the shark catch statistics by area using a similar approach.
## write code here
x <- ggplot(Sharks, aes(x = Area)) +
  geom_bar(fill = "green") +
  theme(axis.text.x = element_text(angle = 65, vjust = 0.5)) +
  ggtitle("Shark catch statistics by area") +
  ylab("Count")
ggplotly(x)
Summary: Townsville had the highest number of shark catches, with a count of 112.
Significance level: by convention, we use \(\alpha = 0.05\).
Limitations:
\(H_0:\) There is no correlation between shark length and water temperature.
\(H_A:\) There is a correlation between shark length and water temperature.
We now calculate the p-value:
cor.test(Sharks$Length..m., Sharks$Water.Temp..C.)
##
## Pearson's product-moment correlation
##
## data: Sharks$Length..m. and Sharks$Water.Temp..C.
## t = -2.1597, df = 530, p-value = 0.03124
## alternative hypothesis: true correlation is not equal to 0
## 95 percent confidence interval:
## -0.177007457 -0.008459799
## sample estimates:
## cor
## -0.09340278
Conclusion: Since the p-value of 0.03124 is less than \(\alpha = 0.05\), we reject the null hypothesis. There is evidence of a correlation between the length of the sharks caught and the water temperature; the sample correlation of -0.093 indicates that this relationship is weak and negative.
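As a sanity check on this conclusion, the t statistic and p-value reported by `cor.test()` can be reproduced by hand from the sample correlation and the sample size, since the Pearson test statistic is \(t = r\sqrt{n-2}/\sqrt{1-r^2}\) with \(n-2\) degrees of freedom. A minimal sketch using the values printed above:

```r
## Values taken from the cor.test() output above.
r <- -0.09340278   # sample correlation
n <- 532           # number of observations (rows in Sharks)

## Test statistic: t = r * sqrt(n - 2) / sqrt(1 - r^2)
t_stat <- r * sqrt(n - 2) / sqrt(1 - r^2)

## Two-sided p-value from the t distribution with n - 2 df.
p_val <- 2 * pt(-abs(t_stat), df = n - 2)

round(t_stat, 4)  # matches the reported t = -2.1597
round(p_val, 5)   # matches the reported p-value = 0.03124
```

Recovering the reported t and p-value from \(r\) and \(n\) alone confirms that the test depends only on these two quantities.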
Queensland Government. (2016). Shark Control Program shark catch statistics, 2001–2016 [Data set]. Open Data Portal. https://www.data.qld.gov.au/dataset/shark-control-program-shark-catch-statistics/resource/5c6be990-3938-4125-8cca-dac0cd734263
Gary, S. (2010, November 4). Are we seeing more sharks than usual at this time of year? ABC Science. https://www.abc.net.au/science/articles/2010/11/04/3056893.htm
Zach. (2021, November 17). The five assumptions for Pearson correlation. Statology. https://www.statology.org/pearson-correlation-assumptions/